MultiMatch: geometry-informed colocalization in multi-color super-resolution microscopy

With recent advances in multi-color super-resolution light microscopy, it is possible to simultaneously visualize multiple subunits within biological structures at nanometer resolution. To optimally evaluate and interpret spatial proximity of stainings on such an image, colocalization analysis tools have to be able to integrate prior knowledge on the local geometry of the recorded biological complex. We present MultiMatch to analyze the abundance and location of chain-like particle arrangements in multi-color microscopy based on multi-marginal optimal unbalanced transport methodology. Our object-based colocalization model statistically addresses the effect of incomplete labeling efficiencies enabling inference on existent, but not fully observable particle chains. We showcase that MultiMatch is able to consistently recover existing chain structures in three-color STED images of DNA origami nanorulers and outperforms geometry-uninformed triplet colocalization methods in this task. MultiMatch generalizes to an arbitrary number of color channels and is provided as a user-friendly Python package comprising colocalization visualizations.

should occur in the image.The maximal relative mean abundance of detected ABC triplets, which is attained for colocalization threshold t = 10 pixels is w ABD = 0.77.For MultiMatch Mode I it is closer to the (by experimental design known) truth of having triplets only, by reaching a maximal relative abundance of w ABC = 0.8 (see Figure 4).Supplementary Figure 2: MultiMatch Mode II relative abundance curves w(t) for experimental STED images.As Figure 4 but analysed with MultiMatch Mode II instead of Mode I: For each setting the solid curves are mean relative abundances with standard deviation bands across a range of colocalization thresholds t from 0 to 10 pixels (25 nm = 1 pixel).The abundances are scaled by the total number of points detected in channel B. Additionally, incomplete labeling efficiency (90% in each channel) corrected abundances are plotted as dotted curves.The true colocalization threshold of 70 nm within nanoruler structures is depicted as a vertical line.A. Setting 1 contains singlet (22 independent experimental STED images), Setting 2 triplet, pair and singlet (22 independent experimental STED images) and Setting 3 triplet nanorulers only (12 independent experimental STED images).

Comparison of methods across colocalization thresholds
We also tested the performances of considered colocalization methods across different colocalization thresholds t, see Supplementary Figure 3.  3: Simulation study for three-color microscopy images with three combinations of chain structures along different colocalization thresholds t.In each Scenario 100 independent STED images and different abundances of triplets, pairs and singlets were simulated with 100% labeling efficiency (Methods).Mean relative abundances curves (scaled by the total number of points in channel B) are shown per colocalization method and chain structure.True simulated relative abundances are plotted as transparent, horizontal dashed lines.A. In Scenario 1 only A,B and C singlets, in B. Scenario 2 all possible singlets as well as AB and BC pairs and in C. Scenario 3 ABC triplets, AB, BC pairs and A, B and C singlets were simulated.
Methods were evaluated on the colocalization grid t ∈ {1, 2, 3, 4, 5, 6, 7, 8} pixels.We experienced, that BlobProb and ConditionalColoc were not directly applicable to a batch of images at once.In particular, BlobProb requires the user to load every image separately into an ImageJ/Fiji Graphical User Interface, where parameters as the colocalization threshold have to be adjusted by hand, respectively.For the MATLAB implementation of ConditionalColoc, all images within a simulation scenario had to be combined into a 'movieList', which could then be input as a whole for colocalization analysis.However, the analysis had to be performed separately for each colocalization threshold.With the runtime of 2 minutes per image and colocalization radius, as reported in the ConditionalColoc manual (https://github.com/kjaqaman/conditionalColoc), the evaluation of our simulation study with ConditionalColoc took about 1000 times longer than with our MultiMatch implementations (0.1 seconds per image and colocalization radius, see Methods Section 4.2).

Comparison with Nearest Neighbor Matching
In addition to the considered colocalization methods provided as packages, plugins or executable scripts, one can also consider greedy Nearest Neighbor Matchings as a compatible algorithm to MultiMatch.Nearest Neighbor Matchings can be implemented in several ways depending on the order of points within a channel and the order of channels being matched in pairs.For the method comparison below, we used the following implementation: 1.For each point in channel A, assign it to its nearest neighbor in B as soon as their pairwise distance is smaller than the colocalization threshold t.We applied MultiMatch Mode I and II and the above described Nearest Neighbor Matching approach directly to the simulated point clouds without additional conversion to microscopy intensity images.To illustrate the differences between MultiMatch's global optimization procedure and the effect of local, greedy Nearest Neighbor searches, we chose to simulate settings with a high particle density and an especially high number of triplets: Scenario i: 1000 ABC triplets only.Scenario ii: 1000 ABC triplets and 1000 B singlets.Scenario iii: 1000 ABC triplets, 500 A, B and C singlets, respectively.
As in all other simulation scenarios, the pixel size was simulated as 1 pixel = 25 nm, the image size was set to 400×400 pixels and the true colocalization distance was fixed to t = 70 nm.For each setting 100 images were simulated, but we directly evaluate the coordinates of simulated point clouds without further translation into an intensity image nor simulation of microscopy noise or point spread function convolution (see examples in Supplementary Figure 4D).
As can be seen in Supplementary Figure 4A, in Scenario i the Nearest Neighbor Matching approach underestimates the number of ABC triplets and overestimates the abundance of AB and BC pairs, although only triplets were simulated.For the a colocalization threshold of 4 pixels = 100 nm, the mean abundance detected by the Nearest Neighbor Matchings only reaches around 939.43 of 1000 simulated ABC triplets with a standard deviation of ca 0.197, while in both MultiMatch Modes I and II all 1000 simulated ABC triplets are recovered for every simulated image.
In Scenario ii we can showcase a similar behavior: Nearest Neighbor Matchings only reach a maximal average abundance of 838.71 out of the 1000 simulated ABC triplets for t = 4 pixels.
Additionally, we can observe that this setting is also challenging Multi-Match Mode II: Similar to Nearest Neighbor Matchings, MultiMatch Mode II underestimates triplets due to the disproportionate abundance of Type B particles and the dense particle distribution.Mode II finds more AB pairs with lower overall pairwise particle distances and therefore on average misses around 15% of all simulated ABC triplets in maximal colocalization threshold.
Interestingly, the shape of abundance curves for the Nearest Neighbor Matching is very similar between Scenarios i, ii and iii altough in the first we only simulated triplets and in the latter two we added different types and ratios of singlets.The Nearest Neighbor Matching approach does not only underestimate triplet abundances but also shows no clear plateau to discriminate between different colocalization structures.MultiMatch Mode I abundance curves, on the other hand, stabilizes at the correct abundance for triplet abundances for Scenario i and ii.If a random distribution of more than one type of singlets is present in the image (Scenario iii), singlets are matched as soon as the colocalization threshold is high enough.Still, one can observe that the abundance curve slopes visibly drop for t ≥ 4 pixels.From the respective MultiMatch Mode II results of those two simulations scenarios, shown in Supplementary Figure 5, one draws that abundance curves again are stabilizing after a colocalization threshold of t = 4 pixels.In case the detection of chain structures, here ABCD quadruples, is aggravated by incomplete labeling efficiency, our estimation framework leads to a consistent improvement of detection results towards the simulated ground truth.

Additional Four-Color STED Simulation Scenarios
For simulation Scenarios IV and V it becomes obvious, that due to high number of particles in the image, the simulated noise and according point detection errors, MultiMatch can not recover all simulated quadruples.In this case, the interactive napari viewer 1 can help to evaluate the noise level, point detection performance and matching results.For uniform noise levels and point detection errors across all channels, we recommend to evaluate channel-wise scaled instead of absolute abundance curves.

Supplementary Figure 4 :
Simulation study for three combinations of chain structures along different colocalization thresholds t.Mean absolute abundances curves with standard deviation bands are shown per method and chain structure.True simulated relative abundances are plotted as transparent, horizontal dashed lines.For each scenario 100 independent images were simulated.A. In Scenario i only ABC triplets and in B. Scenario ii ABC triplets as well as B singlets were simulated.C. Scenario ii ABC triplets as well as A,B and C singlets were simulated.D. Representative particle clouds for Scenarios i-iii.
Supplementary Figure5: MultiMatch Mode II abundance curves w(t) and estimation results n(t) for simulated four-colour STED images.For each simulated scenario 100 independent images were simulated with complete labeling efficiency (s A = s B = s C = s D = 1) and with incomplete labeling efficiency (s A = s B = s C = s D = 0.95), respectively.Solid curves are mean absolute detected abundances with standard deviation bands across a range of colocalization thresholds t from 0 to 10 pixels (25 nm = 1pixel).Corrected abundances are plotted as dotted curves.A. Scenario III: Only singlets were simulated.B. For one exemplary STED image of Scenario III simulated with incomplete labeling efficiency, corrected abundance curves and corresponding confidence bands are shown.C. and D. show the same analysis as shown in A and B but for Scenario IV: ABCD quadruplets and A,B,C and D singlets were simulated.E. and F. show the same analysis as shown in A and B but for Scenario V: ABCD quadruplets only were simulated.G. Representative STED image for Scenario V with image details in the interactive napari viewer allowing a visual check of the image and MultiMatch output quality.For visualization purposes, contrast stretching and increasement of image brightness was applied.Supporting Section 2.5, we additionally simulated the following three four-color STED image scenarios: Scenario III: 50 A, B, C and D singlets, respectively, and no further quadruples, triplets nor pairs.Scenario IV: 200 ABCD quadruples and 100 A,B, C and D singlets, respectively.Scenario IV: 500 ABCD quadruples only.
If matched, do not consider the respective B point for further nearest neighbor searches of channel A points.The match is stored as AB pair.2. Repeat 1. to match points in channel B to their nearest neighbors in channel C. Respective matches are stored as BC pairs.3.If an AB and a BC pair share the same B point, they are re-annotated into one ABC triplet.